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REPRESENTATIONS  OF  QUASI-NEWTON  MATRICES 
AND  THEIR  USE  IN  LIMITED  MEMORY  METHODS 


by 

Richard  H.  Byrd,  Jorge  Nocedal  and  Robert  B.  Schnabel 


ABSTRACT 


We  derive  compact  representations  of  BFGS  and  symmetric  rank-one  matrices  for 
optimization.  These  representations  allow  us  to  efficiently  implement  limited  memory 
methods  for  large  constrained  optimization  problems.  In  particular,  we  discuss  how 
to  compute  projections  of  limited  memory  matrices  onto  subspaces.  We  also  present 
a  compact  representation  of  the  matrices  generated  by  Broyden’s  update  for  solving 
systems  of  nonlinear  equations. 


Key  words :  Quasi-Newton  method,  constrained  optimization,  limited  memory  method, 
large-scale  optimization. 

Abbreviated  title:  Representation  of  quasi-Newton  matrices. 


1.  Introduction. 

Limited  memory  quasi-Newton  methods  are  known  to  be  effective  techniques  for 
solving  certain  classes  of  large-scale  unconstrained  optimization  problems  (Buckley  and 
Le  Nir  (1983),  Liu  and  Nocedal  (1989),  Gilbert  and  Lemarechal  (1989))  .  They  make 
simple  approximations  of  Hessian  matrices,  which  are  often  good  enough  to  provide  a  fast 
rate  of  linear  convergence,  and  require  minimal  storage.  For  these  reasons  it  is  desirable 
to  use  limited  memory  approximations  also  for  solving  problems  that  include  constraints. 
However,  most  algorithms  for  constrained  optimization  require  the  projection  of  Hessian 
approximations  onto  the  subspace  of  active  constraints  and  other  matrix  calculations 
that  can  be  expensive  when  the  number  of  variables  is  large.  This  is  true  even  if  limited 
memory  approximations  are  used,  unless  special  care  is  taken  in  their  representation  and 
manipulation. 
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In  this  paper  we  derive  new  representations  of  limited  memory  quasi-Newton  matrices 
and  show  how  to  use  them  efficiently  in  the  kind  of  matrix  computations  required  in 
constrained  optimization  methods.  We  present  new  expressions  for  both  the  BFGS  and 
symmetric  rank-one  formulae  for  optimization,  and  also  derive  a  compact  expression  for 
Broyden’s  method  for  solving  systems  of  nonlinear  equations.  We  believe  that  these  new 
compact  representations  of  quasi-Newton  matrices  are  of  interest  in  their  own  right,  but 
in  this  paper  we  focus  on  their  use  in  limited  memory  methods. 

To  motivate  the  new  matrix  representations  we  begin  by  describing  the  limited  mem¬ 
ory  BFGS  method  for  unconstrained  optimization.  It  is  a  variation  of  the  standard  BFGS 
method,  which  is  given  by 


%k-\- 1  —  ^kHkgk  k  —  0,1,2,... 

(1.1) 

where  Xk  is  a  steplength,  gk  is  the  gradient  of  the  objective  function  /  :  Rn  — ►  R  at  xk, 
and  where  the  inverse  Hessian  approximation  Hk  is  updated  at  every  iteration  by  means 
of  the  formula 

Hk+\  14  HkVk  pksksk  , 

(1.2) 

where 

Pk  =  1/ yksk,  Vk  =  I  -  pkyksl , 

(1.3) 

and 

sk  =  xk+i  -  Xk,  yk  =  9k+i  ~  9k- 

(see  e.g.  Fletcher  (1987)).  We  say  that  the  matrix  Hk+ 1  is  obtained  by  updating  Hk 
using  the  pair  {sk,yk}- 

The  limited  memory  BFGS  method  is  an  adaptation  of  the  BFGS  method  to  large 
problems.  The  implementation  described  by  Liu  and  Nocedal  (1989)  is  almost  identical 
to  that  of  the  standard  BFGS  method  —  the  only  difference  is  in  the  matrix  update. 
Instead  of  storing  the  matrices  Hk,  one  stores  a  certain  number,  say  m,  of  pairs  {s{,  iji} 
that  define  them  implicitly.  The  product  Hkgk  is  obtained  by  performing  a  sequence  of 
inner  products  involving  gk  and  the  m  most  recent  vector  pairs  {s;,  yi}.  After  computing 
the  new  iterate,  the  oldest  pair  is  deleted  from  the  set  {si-,?/l},  and  is  replaced  by  the 
newest  one.  The  algorithm  therefore  always  keeps  the  m  most  recent  pairs  to 

define  the  iteration  matrix.  This  approach  is  suitable  for  large  problems  because  it  has 
been  observed  in  practice  that  small  values  of  m  (say  m  £  [3,  7])  give  satisfactory  results. 

Let  us  describe  the  updating  process  in  more  detail.  Suppose  that  the  current  iterate 
is  xk  and  that  we  have  stored  the  m  pairs  {.Sj,yt},  i  =  k  —  m,...,k  —  1.  We  choose  a 
'‘basic  matrix”  (usually  a  diagonal  matrix)  and  update  it  m  times  using  the  BFGS 
formula  and  the  m  pairs  i  =  k  -  m,  1.  From  (1.2)  we  see  that  IIk  can  be 

written  as 

Hk  =  (nT-i  '■■Vlm)Hl0)(Vk_m---Vk„1) 

+  Pk-m  (Vk~ i  •  •  ‘Vk_m+1^j  Sk_7nSk_m  (T4_m  +  1  •  •  44- 1) 
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+  Pk-m+1  (yH-i  •  •  •  V?_m+2)  Sk-rn+iSk_m+1  (Vk-m+2  '  •  •  Vk-l) 

+  ’ 

+  Pk- lSk-l&k-i-  (1.4) 

There  is  a  recursive  formula  (Nocedal  (1980))  that  takes  advantage  of  the  symmetry  of 
this  expression  to  compute  the  product  Hkgk  efficiently.  As  a  result,  the  computation  of 
the  search  direction  in  the  limited  memory  BFGS  method  for  unconstrained  optimization 
can  be  performed  very  economically. 

It  turns  out,  however,  that  in  two  respects  this  recursive  formula  is  much  less  eco¬ 
nomical  for  some  of  the  calculations  required  when  constraints  are  present.  First,  when 
the  constraints  are  sparse  the  recursion  does  not  take  good  advantage  of  this  sparsity. 
For  example,  if  e,-  is  a  unit  vector,  the  computation  of  Hke{  is  almost  as  expensive  as  the 
computation  of  Hkgk.  Second,  many  algorithms  for  constrained  optimization  require  the 
direct  Hessian  approximation,  Bk  =  H^1  instead  of  the  inverse  BFGS  approximation, 
Hk.  However,  there  appears  to  be  no  analogous  recursion  for  the  Hessian  approximation 
Bk  and,  as  pointed  out  in  Section  4.2,  a  straightforward  implementation  turns  out  to  be 
quite  costly. 

After  deriving  our  new  quasi-Newton  representations  in  Section  2,  we  show  in  Sec¬ 
tion  3  how  they  can  be  used  in  limited  memory  methods  in  a  way  that  is  efficient  for 
unconstrained  optimization,  and  gets  around  both  of  these  difficulties  in  constrained 
optimization  calculations. 


Notation.  The  number  of  variables  in  the  optimization  problem  is  n,  and  the  number 
of  correction  pairs  used  in  the  limited  memory  methods  is  m.  The  Hessian  approxi¬ 
mation  is  denoted  by  Bk,  and  the  inverse  Hessian  approximation  is  Hk.  The  z'-th  unit 
vector  is  written  as  et-.  A  diagonal  matrix  with  diagonal  elements  0i,  ...,0n  is  denoted  by 
diag[#i, ...,  9n]. 

2.  Compact  Representations  of  BFGS  Matrices 

We  will  now  describe  new  representations  of  the  inverse  and  direct  BFGS  matrices, 
and  show  how  to  compute  several  types  of  matrix-vector  products  efficiently.  In  this 
section  we  will  consider  the  updating  process  in  a  general  setting,  and  will  not  restrict  it 
to  the  case  of  limited  memory  methods. 

Let  us  define  the  n  x  k  matrices  Sk  and  Yk  by 

Sk  —  [-Sq,  ...  ,  «A:-l]  >  Yk  =  [j/O »  •  •  •  ■>  Vk-l]  •  (2-1) 

We  first  prove  a  preliminary  lemma  on  products  of  projection  matrices  that  will  be  useful 
in  subsequent  analysis  and  is  also  interesting  in  its  own  right. 
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Lemma  2.1  The  product  of  a  set  of  k  projection  matrices  of  the  form  (1.3)  satisfies 

V0'  ■  -  Vk-i  —  I  —  YkRklSf,  (2.2) 

where  Rk  is  the  k  X  k  matrix 


(Rk)i,j 


sf-iVj-i  if  i<j 
0  otherwise 


(2.3) 


Proof.  Proceeding  by  induction  we  note  that  (2.2)  holds  for  k  =  1,  because  in  this  case 
the  right  hand  side  of  (2.2)  is 

1  T 

I  yo  <Sn  —  Po. 


•55  2/0 


(2.4) 


Now  we  assume  that  (2.2)  holds  for  some  k ,  and  consider  k  +  1.  If  we  write  the  matrix 
Rk^.\  as 

Rk  S%yk 


Rk+l 


0  f 

Pk 


we  see  that 


*I+i  = 


Rk  PkRklS^yk 
0  pk 


This  implies  that 

I-YmR^S^  =  I-\Ykyk 


RP  -PkR^Slyb 


=  I-  YhR?S?  +  pkYkRpShkSTk  - Pkyksl 
=  (I-YkRk'STk){I-pkyksl). 

Using  this  with  the  inductive  hypothesis  of  (2.2)  we  have  that 


Pk 

-1  C  T 


(2.5) 


i - 

to 

1 _ 

L  Sk  J 

V0---Vk  =  (I-YkR-^SW-PkyksI) 

=  (/-iwiiltf+l). 

which  establishes  the  product  relation  (2.2)  for  all  k.  □ 

It  should  be  pointed  out  that  this  lemma  holds  for  the  product  of  any  sequence  of 
projections  onto  spaces  of  dimension  n  -  1  and  is  a  useful  but  little-known  result.  Es¬ 
sentially  the  same  result  is  also  mentioned  by  Walker  (1988)  in  the  context  of  products 
of  Householder  transformations.  The  lemma  can  be  generalized  to  projections  onto  sub¬ 
spaces  of  arbitrary  and  different  dimensions,  in  which  case  the  matrix  Rk  becomes  block 
upper  triangular. 

The  following  theorem  gives  a  compact  representation  of  the  matrix  Hk  obtained  after 
k  BFGS  updates.  We  will  later  see  that  this  representation  is  often  more  convenient  than 
(1.4). 
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Theorem  2.2  Let  Hq  be  symmetric  and  positive  definite  and  assume  that  the  k  pairs 
{siiVi}i=o  satisfy  sf  yi  >  0.  Let  Hk  be  obtained  by  updating  Hq  k  times  using  the  inverse 
BFGS  formula  (1.2)  and  the  pairs  {s{,yi}^~g.  Then 

[  Rf{Dk  +  Y^H0Yk)R-p  -Rf  1[  Sj 
Hk  =  H0  +  Sk  H0Yk  j  ,  (2.6) 

L  -V  o  Jl^o. 

where  Rk  is  as  given  in  (2.3)  and  Dk  is  the  k  x  k  diagonal  matrix 

Dk  =  diag  5^o,.  •  .  (2.7) 


Proof.  We  write  the  BFGS  formula  (1.2)  as 

Hk  —  Mk  +  Nk,  k  >  1 
where  Mk  and  Nk  are  defined  recursively  by 


Mq  =  Hq 

Afib+i  =  Vf  MkVk, 


(2.9) 

(2.10) 


(  Ni=PoSo*£ 

\  Nk+1  =  V?Nk\ 4  +  pkskST.  (2-10) 

First  note,  from  the  definition  of  Mk  and  (2.2),  that 

Mk  =  (v?_1---V?)Ho(V0---Vls-1) 

=  (/  -  SkR-jY?)H0(I  -YkRpSj).  (2.11) 

Next,  we  will  show  by  induction  that 

Nk  =  SkR-kTDkR-klSl.  (2.12) 

This  is  true  for  k  =  1,  for  in  this  case  the  right  hand  side  of  (2.12)  is  PoSoSq  ,  which  equals 
IVj.  Now  let  us  assume  that  (2.12)  is  true  for  k.  Then,  by  the  definition  (2.10)  of  N, 

Nk+i  =  Vf  SkRfTDkR-klSlVk  +  PksksTk.  (2.13) 

To  simplify  this  expression,  we  note  from  (1.3)  and  (2.5)  that 

RklSTkVk  =  RflSk(I  -  pkVks'k) 


r  si 

’  v 

1 

?r 

kf 

i 

[  4 

’  R3 

-pkRklSlyk 

STm 

I  0 

n  —  1  cT 

(2.14) 
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Also,  using  (2.5)  we  can  write  sk  as 


-T  1 

Sk+lRjc+iCk+l  ■ 

pk 


(2.15) 


Substituting  this  and  (2.14)  in  (2.13),  we  have 


Nk+l 


Dk[l  o]R£1sZ+1  +  Sk+1Rg1 


Pk 


n  —  1  nT 
^k+l^k+l 


Sfc+l  Rk+l 


Dk  0 

0 

L 

— T  n  d— 1 


_1_ 

Pk  J 


0  —  1  qT 
Itk+lDk+l 


Sk+i  Rk+i  Dk+i  Rk+i  $k+i  • 

This  proves  (2.12)  for  k  +  1. 

Finally  by  expanding  the  expression 

RkT(Dk  +  Y^H0Yk)R^ 


Hn  + 


Sk  H0Yk 


-R 


-T 


R~kl 


1 - 

to 

i _ 

.  YkHo  . 

we  see  that  it  is  equal  to  Mk  +  Nk,  where  Mk  and  Nk  are  given  by  (2.11)  and  (2.12). 


□ 


Note  that  the  conditions  sj yi  >0  i  —  0, ...,  k  -  1  ensure  that  Rk  is  nonsingular,  so  that 
(2.6)  is  well  defined.  Indeed  it  is  well  known  (Fletcher  (1987))  that  the  BFGS  formula 
preserves  positive  definiteness  if  sf  y{  >  0  for  all  i. 

Theorem  2.2  gives  us  a  matrix  representation  of  the  inverse  Hessian  approximation 
Hk.  We  now  present  an  analogous  expression  for  the  direct  Hessian  approximation  Bk. 
The  direct  BFGS  update  formula,  i.e.  the  inverse  of  (1.2)  is  given  by 


Bk+ 1  =  Bk-  ?-kpYBk  +  M* 

sk  Bksk 


vhk 


(2.16) 


Theorem  2.3  Let  B0  be  symmetric  and  positive  definite  and  assume  that  the  k  pairs 
SiiViSi— o  sutosfy  Vi  >  0.  Let  Bk  be  obtained  by  updating  Bq  k  times  using  the  direct 
BFGS  formula  (2.16)  and  the  pairs  {s;,  .  Then 


(2.17) 


Bk  =  B0-  B0Sk  Yk 


SfBoSk  Lk 

-1 

'  STkB0  ' 

Y?  J 

where  Lk  is  the  k  x  k  matrix 


{Lk)tJ  =  { FBI 


otherwise 


(2.18) 
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Proof.  Let  us  write  (2.6)  as 

Hk  =  H0+  UkCkU J, 

where 


Uk 


Sk  HoYk  , 


and 


n  =  [  +  YkT H0Yh)R?  -Rf 

[  -Rl1  o 

By  direct  multiplication  we  can  verify  that  the  inverse  of  Ck  is 


(2.19) 


0  -R) t 

~Rk  —(Dk  +  Yk  H0Yk) 


(2.20) 


Applying  the  Sherman-Morrison- Woodbury  formula  (Ortega  and  Rheinboldt  (1970))  to 
(2.19)  we  obtain 


Now 


Bk  =  Bo-BoU^I+CkUtBoUkr'CkUtBo 

=  Bo-BoUkiC^  +  UfBoUkT'uZBo. 


VtBoUk  = 


Bo  [  Sk  H0Yk 


SfB0Sk  SkYk 
Yk  Sk  Y?HaYk 


Therefore  using  (2.20) 


(2.21) 


Ck'  +  U?B0Uk  = 


SlB0Sk  SkYk  -  Rk 
Y? Sk  -  Rl  -Dk 


Note  that  the  matrix  Lk  defined  by  (2.18)  can  be  written  as 


Lk  —  S^Yk  —  Rk, 


(2.22) 


so  that 


Ckl  +  Uk  B0Uk  = 


S{B0Sk  Lk 
Ll  -Dk 


Substituting  this  into  (2.21)  we  obtain  (2.17). 


(2.23) 

□ 


In  the  next  sections  we  will  show  that  the  new  formulae  (2.17)  and  (2.6),  which  at  first 
appear  rather  cumbersome,  are  actually  very  convenient  for  some  calculations  arising  in 
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constrained  optimization.  Before  doing  so  we  make  a  remark  concerning  the  implemen¬ 
tation  of  (2.17). 

The  middle  matrix  in  (2.17), 


SjB0Sk  Lk 

Lk  ~Dk 


(2.24) 


is  indefinite.  However  we  now  show  that  its  inversion  can  be  carried  out  using  the 
Cholesky  factorization  of  a  related  matrix.  First  we  re-order  the  blocks  of  (2.24)  and 
note  that 


-Dk  Lj 
Lk  SfBoSk 


d\' 2  0 

-LkDk112  Jk 


-D\'2 

0 


where  Jk  is  the  lower  triangular  matrix  that  satisfies 


JkJl  =  S%B0Sk  +  LkDpLTk. 


The  following  result  shows  that  Jk  exists  and  is  nonsingular. 


(2.25) 


(2.26) 


Theorem  2.4  If  Bq  is  positive  definite  and  sj y{  >  0 ,i  =  0 1,  then  the  matrix 
Sk  BoSk  +  LkDf1  is  positive  definite. 


Proof.  From  the  definition  (2.7)  we  see  that  Dk  is  positive  definite  and  hence  S J B0Sk  + 
LkDf1  LTk  is  positive  semi-definite.  Suppose  that  uT(S^  B0Sk  +  LkDfl  L^)u  =  0  for  some 
vector  u.  Then  L^u  =  0  and  Sku  =  0,  which  in  turn  implies  that  SkU  —  0.  Recalling 
(2.22)  we  have  Y^f  Sk  =  L ^  +  R so  that  R^u  =  0.  Since  Rj!  is  triangular  with  positive 
diagonal,  we  conclude  that  u  =  0. 

□ 

Therefore,  only  the  Cholesky  factorization  of  the  k  X  k  symmetric  positive  definite  matrix 
SkB0Sk  +  LkDflLl  needs  to  be  computed,  to  implement  (2.17).  This  is  preferable  to 
factorizing  the  indefinite  2 k  x  2 k  matrix  (2.24).  We  will  discuss  the  implementation  of 
(2.17)  in  more  detail  in  section  3.2,  in  the  context  of  limited  memory  methods. 


3.  Application  to  the  Limited  Memory  Method. 

Since  we  know  that  k  BFGS  updates  can  be  written  in  the  compact  forms  (2.6)  and 
(2.17),  it  is  easy  to  describe  a  limited  memory  implementation.  We  keep  the  m  most 
recent  correction  pairs  {s,-,?/,}  to  implicitly  define  the  iteration  matrix.  This  set  of  pairs 
is  refreshed  at  every  iteration  by  removing  the  oldest  pair  and  adding  a  newly  generated 
pair.  We  assume  that  m  is  constant,  but  it  is  not  difficult  to  adapt  all  the  formulae  of 
this  section  to  the  case  when  m  changes  at  every  iteration. 

Suppose  that  at  the  current  iterate  Xk  we  wish  to  construct  the  inverse  limited  memory 
BFGS  matrix  Hk.  We  do  so  by  implicitly  updating  hI°\  the  basic  matrix,  m  times  using 


the  2 m  vectors  {sk-m,  •  •  • ,  Sk-i}  and  {yk..m, . . . ,  yfc-i},  which  have  been  saved.  Let  us 
assume  that  H ^  =  7 kI,  for  some  positive  scalar  7*.  From  (2.6)  we  see  that  the  resulting 
matrix  is 


R-kT{Dk  +  lkY^Yk)R^  -Rf 

Hk  =  Ikl  +  |  Sk  ikYk 

■R~kl  0 

where  now 

Sk  =  5  •  •  •  5  5A:-l]  ?  hfe  =  [j/i-m)  •  •  •  5  Vk-l]  ? 

and  where  Rk  and  Dk  are  the  m  x  m  matrices 


'  s?  • 

.  7 kYkT  _ 

(3.1) 


(3.2) 


( Rk)i,j  — 


($k— m  — 1+0  (2/fe— m— 1+j  )  if  2*  fC  j 
0  otherwise  ’ 


(3.3) 


and 

Dk  =  diag\sl_myk-m,---1Sk-1yk-i\  -  (3.4) 

After  the  new  iterate  Xfc+i  is  generated,  we  obtain  S-k+i  by  deleting  Sk-m  from  Sk  and 
adding  the  new  displacement  sk.  The  matrix  Yk+ 1  is  updated  in  the  same  fashion. 

This  describes  the  general  step  when  k  >  m.  For  the  first  few  iterations,  when  k  <  m, 
we  need  only  replace  m  by  k  in  the  formulae  above.  We  have  assumed  that  =  7 kI 
because  this  choice  is  common  in  practice  (see  Gilbert  and  Lemarechal  (1989)  and  Liu 
and  Nocedal  (1989)).  Other  formulae  for  the  initial  matrix  could  also  be  used,  but  would 
probably  result  in  a  more  expensive  computation. 

A  limited  memory  matrix  based  on  the  direct  BFGS  formula  is  also  easily  obtained. 
Let  the  basic  matrix  be  of  the  form  B [°^  =  07-/,  for  some  positive  scalar  07.  From 
(2.17)  we  see  that  if  we  update  B ^  m  times  using  the  vectors  {sk-m, . . .,  sk-i}  and 
{Vk-m,  ...,yk-i},  we  obtain 


R  k  —  &kl  ®kSk  Yk 


VkSZSk 

Lk 

-1 

okST 

LI 

~Dk  _ 

Y?  J 

where  Sk,Yk,Dk  are  given  by  (3.2)  and  (3.4),  and  where  Lk  is  defined  by 


(Lk)ij 


Sk-m-l+iVk-m-l+j  if  *  >  j 
0  otherwise 


(3.5) 


(3.6) 


We  now  describe  procedures  for  performing  computations  with  these  compact  repre¬ 
sentations  of  limited  memory  BFGS  matrices. 


3.1.  Computations  involving  Hk. 
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We  consider  several  products  involving  the  inverse  limited  memory  matrix  Hk.  To 
save  computations  we  will  store,  in  addition  to  the  two  n  x  m  matrices  Sk  and  Yk,  the 
m  X  m  matrices  Y^Yk,  Rk,  and  Dk.  Since  in  practice  m  is  very  small,  say  m  <  7,  the 
storage  space  required  by  these  three  auxiliary  matrices  is  negligible.  In  the  operation 
counts  given  below  we  concentrate  on  multiplications  since  the  arithmetic  consists  pri¬ 
marily  of  inner  products,  so  that  the  number  of  additions  is  similar  to  the  number  of 
multiplications.  We  note  that  for  the  rest  of  this  section  Sk,Yk,  Rk,  Dk,  Lk  are  defined 
by  (3.2)-(3.4)  and  (3.6). 

Computation  of  Hkgk • 

This  product  defines  the  search  direction  in  a  limited  memory  method  for  unconstrained 
optimization.  Since  some  of  the  work  of  computing  the  product  Hkgk  occurs  also  in  the 
update  of  Hk ,  it  is  efficient  to  consider  both  operations  together. 

At  the  &-th  iteration  of  the  limited  memory  algorithm  for  unconstrained  optimization 
we  must  update  our  representation  of  Hk-i  to  get  Hk,  compute  the  search  direction 
~Hkgk  and  perform  a  line  search.  To  update  Hk- 1  we  delete  a  column  from  and  add 
a  new  column  to  each  of  the  matrices  Sk- 1  and  Yk- 1,  and  make  corresponding  updates 
to  Rk~ i,  Y^Yk-i  and  Dk-\.  We  will  show  that  these  updates  can  be  done  in  0(m2) 
operations  by  storing  a  small  amount  of  additional  information.  For  example,  from  (3.3) 
we  see  that  the  new  triangular  matrix  Rk  is  formed  from  Rk-i  by  deleting  the  first  row 
and  column,  adding  a  new  column  on  the  right,  which  is  given  by 

SkVk-i  =  Sl{gk  -  gk- i),  (3.7) 

and  adding  a  new  row  on  the  bottom,  which  is  zero  in  its  first  m  —  1  components.  It  would 
appear  that  this  requires  mn  multiplications.  However,  note  from  (3.1)  that  the  vector 
Sj-Qk  and  the  first  m  —  1  components  of  S^gk- 1  have  to  be  calculated  in  the  process  of 
computing  Hkgk  and  Hk-igk-i-  Thus  we  may  save  the  first  m—  1  components  of  S^gk- 1 
from  the  previous  iteration,  and  we  need  only  compute  sl_1gk- 1,  which  can  be  obtained 
with  0(m2)  work,  as  we  will  show  below.  Thus  to  compute  S%yk-i  by  the  difference 
(3.7)  will  require  only  0(m2)  operations.  The  matrix  Y?Yk  can  be  updated  in  a  similar 
way  saving  another  mn  multiplications. 

An  updating  process  that  implements  these  savings  in  computation  is  as  follows.  At 
xk ,  the  following  data  has  been  saved  from  the  previous  iteration: 


9k-i9k-i, 


sj 9k- l 

i  =  k  —  m  —  1 , . . 

-2, 

(he.  Sk-i9k-i) 

yj  9k- 1 

rH 

1 

s 

1 

II 

•  ^ 

-2 

(he.  Yl_igk~i). 

Now  we  compute  the  quantities  corresponding  to  the  present  iteration.  We  begin  with 

T  T 

sk-l9k-l  =  —^k-\9k-\Rk-l9k-l, 
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which  by  (3.1)  is  equal  to 


Rk-\(Dk-i+7k-\YkLiYk-\)Rk-\  ~R-k-i 

-Xk-i7k-igk-i9k-i-h-iwk  wk  (3.8) 

0 

where 

Sk-i9k-i 

wk  - 

.  Ik-iY^gk-i 

This  requires  only  0(m 2)  operations  since  g^gk-i,  and  have  already 

been  saved  from  the  previous  iteration. 

Next  we  compute  the  inner  products 

9k  9k, 

9k  i  =  k  -  m, . .  .,k  -  1,  (i.e.  Sj^) 

and 

*  =  fc  -  m,  ...,Ar- 1,  (i.e.  Yk  gk). 

With  this  information,  the  new  components  of  RklY^Yk  and  D k,  can  be  computed  in 
0(m)  work  by  the  formulae 

sfvk- 1  =  sfgk-sfgk-i  i  =  k  -  -  1,  (3.9) 

^2/fc-i  =  yjgk-yjgk-i  i  =  k-  2,  (3.10) 

wf-llte-l  =  ~9k  9k  +  2((jr*.  —  9k-l)T gk  4-  g]c-l9k-l-  (3.11) 

We  now  give  a  complete  description  of  the  procedure  for  updating  Hk  and  computing 

Hk9k- 


Algorithm  3.1  (Step  Computation  for  Unconstrained  Minimization) 

Let  xk  be  the  current  iterate.  Given  sk-uyk-u  gk,  the  matrices  Sk- 1,  l^-i,  Rk- 1, 
Yk~\ Yk-u  Dk-u  the  vectors  S^^gk- 1,  Y^_1gk_1  and  the  scalar  g^_lgk_1: 

1.  Update  Sk,Yk 

2.  Compute  g?gk,  S J gk,  and  Yjf  gk 

3.  Compute  s^g^  by  (3.8) 

4.  Update  Rk,Yf  Yk  and  Dk  with  the  aid  of  (3.9)-(3.11). 

5.  Compute  for  example 

jk  =  yk-i  Tsk-i  /  yk~i  Tyk-\ •  (3.12) 
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6.  Compute 

„  _  [  +  lkY? Yk)R-k\Slgk)  -  7 kRkT(Y?gk) 

P  l  -Rk\sh*)- 

7.  Compute 

fhgk  =  ikgk  +  Sk  7 kYk  p. 

In  this  procedure,  step  2  requires  (2m  +  l)n  multiplications;  step  7  requires  (2m  +  l)n 
multiplications;  step  5  depends  on  the  formula  used  for  jk  (the  choice  (3.12)  is  free  since 
both  inner  products  have  been  stored);  all  other  steps  cost  at  most  0(m 2)  multiplications, 
for  a  total  of  (4m  +  2)n  +  0(m2)  multiplications.  Note,  however,  that  when  this  procedure 
is  part  of  an  algorithm  using  a  line  search  procedure,  the  scalar  is  also  required 

for  the  line  search,  whereas  g]:  gk  is  likely  to  be  needed  to  check  the  stopping  conditions  of 
the  algorithm.  Therefore  the  amount  of  extra  work  required  to  update  Hk  and  compute 
the  step  direction  is  4mn  +  0(m2)  in  that  case.  Of  course  for  large  problems  the  term 
4mn  predominates. 

As  will  be  seen  in  Section  4.1  this  is  the  same  amount  of  work  per  iteration  as  required 
by  the  two-loop  recursion  described  by  Nocedal  (1980),  and  as  far  as  we  know  there  is 
no  more  efficient  way  to  implement  the  unconstrained  limited  memory  BFGS  method. 
Thus  the  two  approaches  are  equally  efficient  for  unconstrained  problems,  but,  as  pointed 
out  in  Section  4.1,  the  compact  matrix  representations  derived  in  this  paper  are  more 
economical  when  computing  certain  quantities  arising  in  sparse  constrained  optimization 
calculations. 

The  product  Hkv. 

Let  us  consider  the  computation  of  the  product  Hkv ,  where  v  is  an  arbitrary  vector. 
From  (3.1)  we  see  that  this  product  is  given  by 

[  Rf(Dk  +  lkY^Yk)R^  -RkT]\  Si V 

Hkv  =  jkv  +  [  Sk  -fkYk  j  .  (3.13) 

[  -Rk1  0  J  L  ^kYk  v  . 

To  carry  out  the  computation  we  first  compute  the  products  S^v  and  Y^v,  which  together 
require  2 mn  multiplications.  To  multiply  the  resulting  2 m  vector  by  the  middle  2m  x  2m 
matrix  involves  3  solutions  of  triangular  systems  and  one  multiplication  by  an  m  x  m 
matrix.  Finally,  it  takes  2 mn  multiplications  to  multiply  [Sk  7 kYk]  with  the  resulting 
2m  vector.  Thus,  if  we  include  the  product  ^kv  and  ignore  0(m)  operations,  the  whole 
computation  requires  (4m  -j-  l)n  -j-  |m2  multiplications. 

Products  of  the  form  vT Hkv  and  uT Hkv. 

Consider  the  weighted  scalar  product  vT Hkv  where  v  is  an  arbitrary  vector,  and  where 
we  assume  that  the  vector  Hkv  is  not  needed.  Using  (3.1)  we  have 

vTHkv  =  lkvTv  +  (RZ1  Slv)T(Dk  +  ~ikY?Yk)(R-kl S{v)  -  2lkvTYkRp Sjv.  (3.14) 
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We  first  compute  S%v  and  Y£ v,  which  requires  2 ran  multiplications.  Next  we  solve  a 
triangular  system  to  get  R^S^v,  which  we  save,  multiply  by  the  matrix  Dk  +  7 Y%Yk, 
compute  vTv  and  do  some  order  ra  inner  products.  Thus  the  total  cost  of  this  compu¬ 
tation  is  (2m  +  1  )n  +  fra2  +  0(ra):  roughly  half  of  what  the  cost  would  be  if  we  first 
computed  Hkv  and  then  vT  Hkv. 

If  we  wish  to  compute  the  product  uT Hkv  for  two  arbitrary  vectors  u  and  v  the  cost 
is  more,  since 

urHkv  =  lkKTv  +  (Rl'Slu)T(Dk  +  lkY?Yk){R?Slv)--lkuTYkRllSlv 
— 'tkuT  SkRkTYk  v 

can  be  seen  to  require  (4ra  +  l)n  +  2 ra2  +  0(ra)  multiplications.  This  is  only  slightly  less 
expensive  than  computing  Hkv  and  then  taking  the  inner  product  of  the  result  with  u, 
which  would  cost  (4m  +  2 )n  -j-  0(ra2)  multiplications. 

The  Product  ATHkA. 

A  related  computation  is  the  problem  of  computing  the  matrix  AT HkA  where  A  is  an 
nxt  matrix  with  t  <  n.  This  computation  occurs  when  solving  the  constrained  nonlinear 
optimization  problem, 

minimize  f(x)  (3.15) 

subject  to  c(x)  =  0  (3.16) 

with  n  variables  and  t  constraints.  This  problem  is  frequently  solved  by  the  sequential 
quadratic  programming  method,  which  at  every  iteration  solves  a  subproblem  of  the  form 

minimize  gkd+±dTBkd  (3-17) 

subject  to  Aid  =  -ck,  (3.18) 

where  Ak  is  the  matrix  of  constraint  gradients  at  the  current  iterate  xk,  ck  is  a  vector 
of  length  t,  and  Bk  =  is  an  approximation  to  the  Hessian  of  the  Lagrangian  of  the 
problem.  If  Ak  has  full  rank,  the  solution  to  (3.17)-(3.18)  can  be  expressed  as 

d  =  -Hk(gk  +  Ak  A)  (3.19) 

where  the  Lagrange  multiplier  A  satisfies 


(AlHkAk)  A  =  -AlHkgk  +  ck.  (3.20) 

Let  us  suppose  that  Hk  is  a  limited  memory  matrix  represented  in  the  compact  form 
(3.1).  Then  the  matrix  AllIkAk  may  be  efficiently  computed  by  first  computing  Si Ak 
and  YU Ak,  which  require  2 mnt  multiplications,  then  R^lSlAk,  requiring  \m2t  multi¬ 
plications,  and  then  computing 

lkATkAk  +  {R^SlAkfiDk  +  -tkY?Yk)(R-k'SlAk)  -  2lkATkYkR~kl  Ak,  (3.21) 
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which  requires  m2t  +  fmt2  +  \{t2  +t)n  +  0((m<ix{m,t})2)  multiplications.  Ignoring  lower 
order  terms,  this  is  a  total  of 

3 

(2m  +  +  \)tn  +  -(m  +  t)mt 

multiplications.  As  long  as  m  and  t  are  fairly  small  this  is  not  extremely  expensive  and 
is  much  less  than  the  cost  of  computing  the  matrix  HkAk  first,  and  then  multiplying 
by  A%.  To  solve  (3.20)  requires  the  Cholesky  factorization  of  A^HkAk  which  requires 
It 3  multiplications.  The  other  matrix  vector  products  required  in  (3.19)  and  (3.20)  cost 
about  (2 1  -f  4m)n,  if  certain  quantities  computed  in  other  parts  of  the  procedure  are 
saved  and  reused  appropriately. 

Sparse  computations  with  Hk 

We  now  consider  computations  similar  to  those  in  the  previous  section  but  where  the 
vectors  and  matrices  multiplying  Hk  are  sparse.  This  is  an  important  case  because,  even 
though  gk,Sk ,  and  Yk  are  not  likely  to  be  sparse,  it  is  very  common  to  have  constrained 
optimization  problems  where  the  gradients  of  the  constraints,  and  thus  the  matrix  A 
in  (3.18)  are  sparse.  A  special  case  in  which  we  are  very  interested  is  the  case  of  a 
minimization  subject  to  bound  constraints,  where  the  matrices  dealt  with  are  actually 
submatrices  of  the  identity.  Significant  reductions  in  computational  cost  result  in  such 
problems  if  efficient  sparse  storage  is  used. 

The  product  Hkei  requires  2 mn  +  0(m2)  multiplications.  This  is  easy  to  see  from 
(3.13),  since  S^e{  and  Yjf  et-  require  only  0(m)  indexing  operations.  For  the  same  reason, 
we  see  from  (3.14)  that  ef  Hke{  can  be  computed  with  0(m 2)  multiplications. 

Consider  now  ATHkA  in  the  case  where  A  is  an  n  x  t  sparse  matrix  with  nA  non¬ 
zeros.  We  perform  this  computation  by  (3.21).  The  products  A  and  Yf  A  together 
require  2 mnA  multiplications.  The  back-solve  R^S^A  requires  \mt2  multiplications, 
and  the  rest  of  the  operations  require  2 mt2  +  m2t  +  0((max{m,t})2  multiplications 
plus  the  operations  of  AT A  which  cost  at  most  tnA  multiplications.  Thus  the  total  is 
0(max{m,t})nA  -f  (2 1  +  | m)mt  -j-  0((max{m,  t})2)  .  Thus  we  see  that,  while  in  the 
previous  section  the  computational  effort  in  most  tasks  was  roughly  proportional  to  the 
number  of  variables  n ,  in  the  sparse  case  it  is  proportional  to  the  number  of  non-zeros  in 
the  sparse  array  under  consideration. 


3.2.  Operations  with  Bk 

We  now  consider  the  direct  Hessian  approximation  Bk.  To  take  advantage  of  the 
decomposition  (2.25),  we  rewrite  (3.5)  as 


[  Yk  akSk  ] 

\-dI/2  D-k^q  1 

-1 

D\/2  0  ’ 

-1 

1 - 

w 

I _ 

o 

.  -LkDZ112  Jk  . 

.  °kSl  _ 

(3.22) 
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where  Jk  is  defined  by  (2.26).  We  use  this  expression,  both  in  the  sparse  and  dense  case, 
to  compute  several  products  involving  Bk . 

Update  of  Bk  and  the  product  BkV. 

This  computation  is  required  when  applying  limited  memory  methods  to  solve  con¬ 
strained  optimization  problem.  It  occurs,  for  example,  in  the  algorithm  for  nonlinearly 
constrained  problems  developed  by  Mahidhara  and  Lasdon  (1990),  and  in  the  primal 
limited  memory  algorithm  for  bound  constrained  optimization  described  by  Byrd,  Lu 
and  Nocedal  (1992). 

The  following  procedure,  which  is  based  on  the  representation  (3.22),  describes  in 
detail  the  k- th  step  of  an  iteration  that  first  updates  Bk  and  then  computes  the  product 
BkV  for  an  arbitrary  vector  v. 

Algorithm  3.2 

Let  xk  be  the  current  iterate,  and  assume  that  the  matrices  Sk- 1,  Yk-i,  Lk-i, 
Sk-iSk-u  and  Dk-i  have  been  stored.  The  vectors  sk-i,yk-i  have  just  been  computed, 
and  the  vector  v  is  given. 


1.  Obtain  Sk,Yk ,  by  updating  Sk- i  and  Yk-\ . 

2.  Compute  Lk^S^Sk  and  Dk . 

3.  Compute  for  example 


<?k  =  yk-iSk-i/sk-iTSk-i. 


(3.23) 


4.  Compute  the  Cholesky  factorization  of  akS^Sk  +  LkD^L^  to  obtain  JkJ{ . 

5.  Compute 


P  = 


Ykv 

crkSTv 


6.  Perform  a  forward 

V  ■  = 


7.  Compute 


and  then  a  backward  solve  to  obtain 


f  -dT  dz1/2lt  ' 

-l 

D\'2  0  ' 

'  o 

.  -LkD~k112  Jk  _ 

Bkv  =  akv  - 

Yk 

VkSl  \ 

P • 

The  first  step  of  this  procedure,  in  which  the  oldest  columns  of  the  matrices  Sk-\ , 
Yk-i  are  replaced  by  the  vectors  Sk- 1,  and  yk-u  does  not  require  any  arithmetic.  Step 
2  requires  2 m  inner  products  to  form  the  new  columns  of  matrices  Lk,  S%Sk  and  Dk, 
which  cost  <lvnn  multiplications.  The  choice  of  Uk  in  step  3  costs  only  one  multiplication 
since  both  yk-iT $k-i  and  Sk-iT sk-i  have  been  calculated  in  step  2.  In  step  4  the 
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Cholesky  factorization  of  the  positive  definite  matrix  akSkSk  +  LkDklLl  costs  0(m3) 
multiplications.  Step  5  costs  2 mn  multiplications.  The  forward  and  the  backward  solves 
of  2m x 2m  triangular  systems  in  step  6  cost  0(m2)  multiplications.  Step  7  costs  (2m+l)n 
multiplications.  In  summary,  this  procedure  costs  2mn-j-0(m3)  multiplications  from  step 
1  to  step  4,  where  the  matrix  Bk  is  defined;  and  costs  (4m  +  l)n-{-0(m2)  multiplications 
from  step  5  to  step  7,  where  the  product  BkV  is  calculated. 

The  weighted  scalar  product  vT BkV. 

This  product  occurs,  for  example,  in  the  conjugate  gradient  inner-iteration  as  well  as 
in  the  Cauchy  point  computation  of  the  primal  algorithm  described  by  Byrd,  Lu  and 
Nocedal  (1992).  Using  (3.22)  we  have 


vT  BkV  —  <JkVTv  —  vTWk 


\-dV 2  DZ^Ll  ] 

-1 

1 - 

o 

<M 

H  -V 

1 _ 

1 - 

o . 

1 _ 

.  -LkD~k112  Jk  J 

-l 

Wkv, 


(3.24) 


where 


We  first  compute  and  store  the  matrix  vector  products  Yfv,crkS%v,  which  determine 
Wkv,  and  which  require  2 mn  multiplications.  Then  we  solve  two  2m  x  2m  triangular 
systems,  and  compute  the  scalar  product  of  two  2m-vectors;  all  of  these  cost  at  most 
0(m  )  multiplications.  The  last  part  is  to  compute  UkV^v,  and  subtract  the  previously 
computed  scalar  from  it.  The  total  cost  of  this  computation  is  (2m  +  l)n  +  0(m2) 
multiplications.  Of  course  in  the  case  v  =  gk,  which  is  often  required,  using  previously 
computed  quantities  form  the  computation  of  Hk  would  allow  this  to  be  reduced  to 
0(m2). 

Sparse  computations  with  Bk 

Calculations  involving  the  product  of  Bk  and  sparse  vectors  involve  savings  similar  to 
those  involving  Hk;  for  example,  computing  Bke{  requires  2mra  +  0(m3)  multiplications. 
A  special  but  important  sparse  case  concerns  minimization  problems  subject  to  bound 
constraints,  in  which  the  constraint  gradients  are  submatrices  of  the  identity  matrix. 
Minimizing  over  a  subspace  in  that  case  involves  computations  with  the  reduced  Hessian 
approximation  Bk  =  BkZ,  where  Z  is  an  nxt  matrix  whose  columns  are  unit  vectors. 
Thus  the  subspace  problem  is  of  size  i. 

To  express  Bk  we  use  (3.22)  to  obtain 


[  Yk  akSk  1 

-  -or  dum' 

-1 

2  0  ' 

-1 

'  U  ■ 

L  J 

®  Jk 

~LkD-k1/2  Jk  \ 

.  akSj  _ 

where  I  =  ZT  Z  is  the  identity  matrix  of  size  t,  and  Yk  =  ZTYk  and  Sk  =  ZTSk  are  txm 
submatrices  of  Yk  and  Sk.  The  procedure  of  multiplying  the  reduced  Hessian  Bk  by  an 
arbitrary  t- vector  v  is  similar  to  steps  5  to  7  of  Algorithm  3.2  and  costs  (4m  +  l)i+0{m2) 
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multiplications.  Similarly,  the  weighted  scalar  product  vT Bkv  costs  (2m  +  1  )t  +  0(m2) 
multiplications. 

In  this  case  we  see  significant  reductions  in  computational  cost,  resulting  in  work 
proportional  to  t  rather  than  to  n. 


4.  Alternative  Formulae. 

For  the  sake  of  completeness  we  now  review  two  other  known  approaches  for  handling 
limited  memory  matrices.  The  first  approach  exploits  the  symmetry  and  structure  of 
(1-4),  giving  rise  to  an  efficient  two-loop  recursion  for  computing  products  using  the 
inverse  Hessian  approximation.  The  second  approach  is  for  the  direct  BFGS  update  and 
consists  of  a  straightforward  sequence  of  multiplications. 


4.1.  The  Two-Loop  Recursion 

The  following  recursive  formula  computes  the  step  direction  Hkgk  for  unconstrained 
minimization.  It  is  given  in  Nocedal  (1980)  and  is  based  on  the  recursion  developed  by 
Matthies  and  Strang  (1979)  for  the  standard  BFGS  method.  As  before,  Hk  represents  a 
limited  memory  BFGS  approximation  of  the  inverse  Hessian.  It  is  obtained  by  applying 
m  updates  to  a  basic  matrix  Hk  ^  using  the  m  most  recent  correction  pairs,  which  we 
label  for  simplicity  («0,  y0), (^m_a,  2/m_i). 


i-  q  =  9k 


2. 


’or  i  =  m  —  1, . . . ,  0 
cti  =  pisf  q  (store  a,-) 

q  :=  q  - 


3.  r  =  H(k0)q 


4. 


br  i  =  0, 1, . . .,  m  -  1 

P  =  PivI r 

r  :=  r  +  ^-(a,-  -  ft) 


5.  Hkgk  =  r 


Excluding  step  3,  this  algorithm  requires  4 mn  multiplications;  if  is  diagonal  then 
n  additional  multiplications  are  needed.  W hen  used  for  unconstrained  minimization 
the  computation  and  storage  cost  is  thus  essentially  the  same  as  using  formula  (2.6) 
implemented  as  described  in  Section  3.1,  as  long  as  is  a  scalar  multiple  of  the 
identity.  However,  the  two  loop  recursion  has  the  advantage  that  the  multiplication  by 
the  basic  matrix  H |0)  is  isolated  form  the  rest  of  the  computations.  As  a  result  the  two- 
loop  recursion  will  be  less  expensive  than  (2.6)  in  the  case  when  differs  from 
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In  a  typical  iteration  k ,  the  matrix  Bk  is  obtained  by  updating  a  starting  matrix  B ^ 
m  times  using  the  m  most  recent  pairs,  which  we  denote  for  simplicity, 

(^0?  2/0)5  •  •  •  5  (^m  — 1 1  Urn  —  1  )• 

From  (4.1)  we  see  that  Bk  can  be  written  as 


m  — 1 

Bk  =  -  a.af], 

1=0 

where  the  vectors  a;,  6*  can  be  computed  by  means  of  the  following  formula: 
For  k  =  0, 1, . . . ,  m  —  1 
1. 

bk  =  Vk/iVkSk)* 

2. 

k- 1 

aA:  =  B^Sk  +  [(6fsjfc)6;  -  (flf  Sk)di 

i- 0 

3. 

Ot  :=  ak/(sla, t)5. 


(4.2) 


(4.3) 

(4.4) 

(4.5) 


At  the  next  iteration  we  repeat  this  process,  except  that  the  pair  (50,2/0)  is  replaced 
by  the  new  pair  The  vectors  a*  need  to  be  recomputed  form  scratch  since  they 

all  depend  on  the  deleted  pair  (s0,y0).  On  the  other  hand,  the  vectors  b{  and  the  inner 
products  bj sk  can  be  saved  from  the  previous  iteration,  and  only  the  new  values  bm  and 
bj sm  need  to  be  computed.  Taking  this  into  account,  and  assuming  that  B^  =  I  we 
find  that  approximately 

3/2  m2n  multiplications, 

are  needed  to  determine  the  limited  memory  matrix. 

To  compute  Bmv,  for  some  vector  v  <E  Rn ,  using  (4.2)  requires  4 mn  multiplications. 
This  approach  is  therefore  less  efficient  than  that  based  on  the  compact  matrix  represen¬ 
tation  described  in  section  3.2.  Indeed,  whereas  the  product  Bkv  costs  the  same  in  both 
cases,  updating  the  representation  of  the  limited  memory  matrix  using  the  compact  form 
requires  only  2 mn  multiplications,  compared  to  3/2  m2n  multiplications  needed  by  the 
approach  described  in  this  section. 


5.  Compact  Representation  of  SRI  Matrices. 

In  this  section  we  develop  compact  representations  of  matrices  generated  by  the  sym¬ 
metric  rank-one  (SRI)  formula.  These  representations  are  similar  to  the  ones  derived  for 
the  BFGS  formula,  but  under  some  conditions  require  less  storage. 


19 


by  more  than  a  simple  scalar  multiplication,  since  the  entire  matrix  Y£ H^k)Yk  would 
then  have  to  be  updated. 

However,  the  two-loop  recursion  cannot  be  efficiently  adapted  for  sparse  projections. 
Let  us  consider  for  example  the  product  Hke{,  which  can  be  obtained  by  replacing  g k 
with  c,-  in  the  two-loop  recursion.  Since  the  vectors  st-  and  yi  are  in  general  not  sparse, 
we  see  that  only  the  computation  of  am_i  in  step  2  results  in  savings.  Thus  steps  2  and 
4  require  (4m  -  l)n  multiplications  -  almost  the  same  as  in  the  dense  case. 

We  should  also  mention  that  while  the  compact  form  (2.6)  has  an  analog  (2.17)  for 
the  direct  update,  we  know  of  no  procedure  analogous  to  the  two  loop  recursion  that  can 
compute  the  direct  update  from  B^\sk,  and  Yk  in  0(mn )  operations. 

Mathematically,  the  relation  of  the  two-loop  recursion  to  (2.6)  can  be  seen  if  we  note 
that  (2.6)  can  be  expressed 

Hk  =  (J  -  SkR~kTYT)n£\l  -  YkR~k'Sl)  +  SkRf  DkR?  STk  . 

The  vector  made  up  of  the  coefficients  at  can  then  be  seen  to  be  R^Sj^gk,  and  the  final 
value  of  the  vector  q  is  (/  —  YkR^  1  S^)gk.  Note  that  in  the  two-loop  procedure  everything 
is  computed  afresh  at  each  iteration,  thus  making  it  easier  to  change  parameters  such  as 
H I  \  while  implementing  (2.6)  involves  saving  and  updating  more  computed  quantities, 
thus  making  information  such  as  sparse  projections  of  H  more  immediately  accessible. 

A  close  examination  of  the  two-loop  recursion  reveals  that  it  is  similar  in  structure 
to  computations  of  gradients  by  means  of  the  adjoint  method  (or  the  reverse  mode  of 
automatic  differentiation  (Griewank  (1989)).  In  fact  Gilbert  and  Nocedal  (1991)  show 
that  there  is  a  precise  relationship  between  these  two  algorithms:  the  two-loop  recursion 
can  be  obtained  by  applying  the  adjoint  method  to  compute  the  gradient  of  the  function 
Kq)  -  \gTHkg  with  respect  to  its  argument  g,  where  Hk  is  the  limited  memory  BFGS 
matrix.  The  scalars  which  are  saved  during  the  first  loop,  correspond  to  the  quantities 
referred  to  as  the  adjoint  variables  in  the  optimal  control  literature. 


4.2.  A  Straightforward  Approach. 

The  direct  BFGS  formula  (2.16)  can  be  written  as 


Bk+ i  =  Bk  -  akal  +  bkb% , 


(4.1) 


where 


ak 


Bks  k 


(*; iBksl)* 


A  straightforward  implementation  of  the  limited  memory  method  consists  of  saving  these 
intermediate  vectors  cq  and  b{  to  define  the  iteration  matrix.  It  has  been  used  by  several 
authors  including  Mahidhara  and  Lasdon  (1990). 
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The  SRI  update  formula  is  given  by 


Bk+i  =  Bk  + 


(yk  Bk3k)(Dk  BkSk) 
( Vk  BkSk)TSk 


(5.1) 


see  for  example  Fletcher  (1987).  Note  that  this  update  is  well  defined  only  if  the  de¬ 
nominator  ( Bksk  —  yk)T Sk  is  nonzero.  In  recent  implementations  of  the  SRI  method,  the 
update  is  simply  skipped  if  this  denominator  is  very  small  relative  to  j|s&||||l?jfeSfc  —  yk\\ 
(Conn,  Gould  and  Toint  (1988),  Khalfan,  Byrd  and  Schnabel  (1990)).  Since  the  SRI 
update  does  not  have  the  property  of  hereditary  positive  definiteness,  there  is  no  reason 
to  enforce  the  curvature  condition  sjyk  >  0  as  with  BFGS  updating,  and  we  will  thus 
consider  a  sequence  of  updates  to  an  arbitrary  matrix  Bq  subject  only  to  the  assumption 
that  the  update  is  well  defined. 


Theorem  5.1  Let  the  symmetric  matrix  B0  be  updated  k  times  by  means  of  the  SRI 
formula  (5.1)  using  the  pairs  {si,yi}^~Q,  and  assume  that  each  update  is  well  defined, 
i.e.  sJ(BjSj  —  yj)  ^  0 :j  =  0 1.  Then  the  resulting  matrix  Bk  is  given  by 

Bk  =  B0  +  (Yk  -  B0Sk)(Dk  YLk  +  Ll-  SlB0Sk)-\Yk  -  B0Sk)T ,  (5.2) 

where  Sk,Yk,Dk)  and  Lk  are  as  defined  in  (2.1),  (2.7)  and  (2.18),  and  the  matrix  Mk  = 
(Dk  *f  Lk  +  L 'k  —  Sk  B0Sk)  is  nonsingular. 


Proof.  We  proceed  by  induction.  When  k  =  1  the  right  hand  side  of  (5.2)  is 


Bo  +  iVo  BoSo\y0- BoSoV sfyo  BoS° 

Let  us  now 

assume  that  (5.2)  holds  for  some  k.  Define 

Qk  -  [(Zo,  •  •  • ,  qk-i]  =  Yk  -  B0Sk , 

(5.3) 

and 

Mk  =  Dk  +  Lk  +  L 'k  —  SlB0Sk. 

(5.4) 

Therefore 

Bk  =  B0  +  QkMklQTk. 

Applying  the  SRI  update  (5.1)  to  Bk  we  have 


d  I  q  *  r-\QT  ,  (Vk  Bosk  QkMk  Qk  sk)(yk  Bosk  QkMk  1Q])sk 
*  (Vk-  B0sk)Tsk-slQkM^Qlsk 

B0  +  QkMpQl  +  Ar.  9 L: ‘ Wk ) ip  zf k: 

(lk  Sk  ~  ™k  Mk  Wk 

Bo  +  [qkqk  -  qk(wTkMkl)QTk  -  Qk(Mklwk)qk 
+Qk(bkMkl  +  MklwkwlMkl)QTk]  /8k, 
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where  we  have  defined 


wk  =  Qksk, 


(5.5) 


and  where  the  denominator 


h  =  qlsk-wlMklwk 
C  2/A:  Bk$k) 

is  non-zero  by  assumption.  We  may  express  this  as 

1 


(5.6) 


Bk+i  —  Bq  +  —  [Qk  qk] 

Ok 


Mk  1(SkI  +  wkwl  Mk  1 )  -Mk  1  wk 
~wlMkl  1 


- 

'  Ql' 

.  ik  _ 

(5.7) 


Now,  from  definitions  (5.3),  (5.4)  and  (5.5)  we  see  that  the  new  matrix  Mk+i  is  given  by 

Mk  Wk 


Mk+ 1 


T  T 
Wk  %  Sk 


and  by  direct  multiphcation,  using  (5.3),  (5.5)  and  (5.6),  we  see  that 


'  Mk 

Wk 

T 

[  Wk 

Vksk  . 

Mk1(6kI  +  WkW^Mb1)  -Mklwk 

1 


T  =  /. 

Ok 


(5.8) 


Therefore  Mk+i  is  invertible,  with  Mk^  given  by  the  second  matrix  in  (5.8),  but  this 
is  the  matrix  appearing  in  (5.7).  Thus,  we  see  that  (5.7)  is  equivalent  to  equation  (5.2) 
with  k  replaced  by  k  +  1,  which  observation  establishes  the  result. 

□ 

Since  the  SRI  method  is  self  dual,  the  inverse  formula  can  be  obtained  simply  by 
replacing  B,s,y  by  H,y,s  respectively  (see  Dennis  and  Schnabel  (1983)).  Alternatively, 
if  Bk  is  invertible,  application  of  the  Sherman- Morrison- Woodbury  formula  to  (5.2)  shows 
the  inverse  of  Bk  is  given  by 


Hk  =  H0  +  (Sk  -  H0Yk)(Rk  +  Rl-Dk-  Y? H0Yk)-\Sk  -  H0Yk)T.  (5.9) 

However,  in  the  context  of  unconstrained  optimization,  since  the  SRI  update  is  not 
always  positive  definite  this  formula  is  not  as  likely  to  be  useful  in  step  computation  as 
is  the  inverse  BFGS  update. 

It  should  be  clear  how  to  develop  limited  memory  SRI  methods.  In  (5.2)  we  replace 
B0  with  the  basic  matrix  at  the  k- th  iteration,  which  we  denoted  earlier  by  B^\  and  Sk 
and  Yk  should  now  contain  the  m  most  recent  corrections,  as  in  (3.2).  Savings  in  storage 
can  be  achieved  if  Bk  ^  is  kept  fixed  for  all  k ,  for  in  this  case  the  only  n- vectors  one  needs 
to  store  are  the  m  columns  of  Qk~  This  would  result  also  in  some  savings  in  the  cost 
of  updating  the  matrix  Mkl  depending  on  the  step  computation  strategy  used.  On  the 
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other  hand,  if  B j.  ^  is  a  scalar  multiple  of  the  identity  and,  as  is  often  the  case,  one  wants 
to  change  the  scalar  at  each  iteration,  then  both  Sk  and  Yk  must  be  stored  separately, 
and  the  storage  and  updating  costs  of  the  limited  memory  SRI  and  BFGS  methods  are 
similar. 

We  will  not  give  detailed  algorithms  for  computing  products  involving  limited  memory 
SRI  matrices  because  the  ideas  are  very  similar  to  those  described  in  the  previous  section. 
One  point,  however,  that  is  worth  discussing  is  how  to  compute  the  denominator  in  (5.1), 
at  each  stage  of  the  limited  memory  updating,  to  determine  if  the  update  should  be 
skipped.  The  condition 

sj  (BjSj  —  yj)  0,  j  —  0,  ...,&—  1,  (5.10) 

can  be  expensive  to  test.  Note  however  that  (5.10)  is  equivalent  to  the  nonsingularity 
of  the  principal  minors  of  Mk.  Thus,  when  using  the  form  (5.2)  in  a  limited  memory 
method,  the  condition  (5.10)  could  be  tested  when  computing  a  triangular  factorization 
of  Mk  without  pivoting,  with  the  test  for  a  zero  on  the  diagonal  of  the  factor  being  made 
relative  to  the  magnitude  of  Qk  and  Sk.  Skipping  an  update  would  correspond  to  deleting 
the  corresponding  row  and  column  of  Mk. 


6.  Representation  of  Broyden  Matrices  for  Nonlinear  Equations. 

A  widely  used  secant  approximation  to  the  Jacobian  matrix  of  a  system  of  nonlinear 
equations, 

F(x)  =  0,  F  :  &n  — >  r\ 
is  the  Broyden  update  (Broyden  (1965)), 

Ak+1  =  Ak  +  hir-.fr6*)'5?, 

sis>= 

Here  sk  =  xk+\  -xk,  yk  -  F(xk+i)-  F(xk),  and  Ak  is  the  approximation  to  the  Jacobian 
°f  F.  In  this  section  we  describe  compact  expressions  of  Broyden  matrices  that  are  similar 
to  those  given  for  BFGS  and  SRI.  As  before,  we  define 

Sk  =  [5o,  <  •  •  5  sk-i]  >  Yk  —  [yo,  •  •  • ,  yk- 1] ,  (5-3) 

and  we  assume  that  the  vectors  Si  are  non-zero. 


(6.1) 

(6.2) 


Theorem  6.1  Let  A0  be  a  nonsingular  starting  matrix,  and  let  Ak  be  obtained  by  up¬ 
dating  A0  k  times  using  Broyden’s  formula  (6.2)  and  the  pairs  {s{,yi}-~Q.  Then 

Ak  =  A0  +  (Yk  -  AoS^N^Sl,  (6.4) 

where  Nk  is  the  k  x  k  matrix 


(Fk)ij 


ifi<j 

0  otherwise 


(6.5) 
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Proof.  It  is  easy  to  show  (using  induction)  that  Ak  can  be  written  as 


Ak  —  Bk  -f  Ck, 

where  B k  and  Ck  are  defined  recursively  by 

f  Bo  =  Aq 

\  Bk+i  =  Bk(I  -  pkSksl)  Vk  >  0, 


and 


and  where 


Co  =  0 

Ck+1  =  Ck(I  -  pkSkSk)  +  pkVkSk  >  0, 

Pk  =  l/slsk- 


(6.6) 


(6.7) 


(6.8) 


Considering  first  B r-  we  note  that  it  can  be  expressed  as  the  product  of  B0  with  a  sequence 
of  projection  matrices, 


Bk  =  B0(I  -  posos^)  •  •  •(/ -  pk-isk-isl^).  (6.9) 

Now  we  apply  Lemma  2.1,  with  y  :=  s  in  the  definition  (1.3),  to  this  product  of  projections 
to  yield  the  relation 

Bk  =  A0  -  AoSkN^Sk,  (6.10) 

for  all  k  >  1. 

Next  we  show  by  induction  that  Ck  has  the  compact  representation 


Ck  =  YkN^lSk  .  (6.11) 

By  the  definition  (6.8),  we  have  that  C\  =  VoPoSq  ,  which  agrees  with  (6.11)  for  k  =  1. 
Assume  now  that  (6.11)  holds  for  k.  Then  by  (6.8), 


Ck+i 


YkN^SUl-PkSksD  +  PkVksl 

PkYkN^Slsksl  +  PkVk4 

PkN^S^Sk  _ 

0  0 


YkN^S? 

Yk 


Yk+i 


Vk 


* k 1 


^k1 


0 


Note,  however,  that 


pkNr'sIsk 


k 

pk 


Sk+ 1  • 


Si 


+ 


Yk  yk 


0  0 
o  Pk 


SI 

„T 


(6.12) 


^k1  -PkNZ'Si’sk 

0  pk 


-1  cT. 


'  Nk 

Sksk 

0 

_1_ 

Pk 

=  L 


(6.13) 
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which  implies  that  the  second  matrix  on  the  right  hand  side  of  (6.12)  is  By 

induction  this  establishes  (6.11).  Finally,  substituting  (6.10)  and  (6.11)  in  (6.6),  we 
obtain  (6.4). 

□ 


We  now  derive  a  compact  representation  of  the  inverse  Broyden  update  which  is  given 
by 


A-1 

/ik+ 1 


Ar  1  + 


(sk  -  Aklyk)slA\ 
slAklVk 


(see  for  example  Dennis  and  Schnabel  (1983)). 


(6.14) 


Theorem  6.2  Let  A0  1  be  a  nonsingular  starting  matrix,  and  let  Akl  be  obtained  by 
updating  Aq 1  k  times  using  the  inverse  Broyden  formula  (6. If)  and  the  pairs  { s^yA^Z 
Then 

Ak%  =  V  "  (V**  -  Sk)(Mk  +  SfA"1^)-1^-1,  (6.15) 

where  Sk  and  Yk  are  given  by  (6.3)  and  Alk  is  the  k  x  k  matrix 


( Mk)i,j 


ifi>j 

0  otherwise 


(6.16) 


Proof.  Let 


so  that  (6.4)  becomes 


U  =  Yk  -  A0Sk , 


=  NpSl 


Ak  =  A0  +  UVT. 

Applying  the  Sherman-Morrison-Woodbury  formula  we  obtain 


Ap  =  Ao1 -Ao1U(I  +  VtAo1U)-1VtAo1 

=  Ao1  -  AZ\Yk  -  A0Sk)(I  +  NpSlAZ\Yk  -  A0Sk))~lN^ Sj A^ 

=  -V  -  (Vn  -  St)(Nk  +  5jA0-1yl.  -  Sk  S  k)~l  Sk  Aq1  . 

By  (6.5)  and  (6.16)  we  have  that  Nk  -  S%Sk  =  Mk,  which  gives  (6.15). 

□ 

Note  that  since  we  have  assumed  that  all  the  updates  given  by  (6.14)  exist,  we  have 
implicitly  assumed  the  nonsingularity  of  Ak.  This  nonsingularity  along  with  the  Sherman  - 
Morrison  formula  ensures  that  (Mk  +  Sj A^Yk)  is  nonsingular. 

These  representations  of  Broyden  matrices  have  been  used  by  Biegler,  Nocedal  and 
Schmid  (1992)  to  approximate  a  portion  of  the  Hessian  of  the  Lagrangian  in  a  successive 
quadratic  programming  method  for  constrained  optimization. 
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7.  Relation  to  Multiple  Secant  Updates. 

There  is  a  close  algebraic  correspondence,  and  in  certain  special  cases  an  equivalence, 
between  the  representations  of  a  sequence  of  quasi-Newton  updates  that  have  been  dis¬ 
cussed  in  this  paper,  and  multiple  secant  updates  that  have  previously  been  discussed 
by  several  authors  including  Barnes  (1965),  Gay  and  Schnabel  (1978),  Schnabel  (1983), 
and  Khalfan  (1989).  In  this  section  we  briefly  discuss  this  correspondence,  for  the  BFGS, 
SRI,  and  Broyden  updates.  We  also  make  a  few  comments  about  the  tradeoffs  between 
using  these  two  types  of  updates.  In  additional  to  the  notation  of  the  preceding  sections, 
we  use  the  notation  that  Rk  is  the  k  X  k  matrix  that  is  the  strict  upper  triangle  of  S^Yk, 
i.e.  Rk  =  Rk  -  Dk  where  Rk  and  Dk  are  defined  by  (2.3)  and  (2.7).  Thus 

SkYk  =  Lk  +  Dk  -f  Rk  (7-1) 


where  Lk  is  defined  in  (2.18). 

Multiple  secant  updates  are  updates  that  enforce  the  last  k  secant  equations,  i.e. 
in  the  notation  of  Section  1  BkSk  =  Yk  or  HkYk  =  Sk.  While  the  papers  mentioned 
above  generally  consider  using  multiple  secant  update  to  update  Bk  to  Bk+ 1,  analogous 
updates  to  those  considered  in  this  paper  would  arise  from  using  multiple  secant  updates 
to  update  Bq  to  Bk  or  Hq  to  Hk.  This  is  the  context  in  which  we  consider  multiple  secant 
updates  in  this  section. 

In  this  context,  the  multiple  secant  version  of  the  direct  BFGS  update  applied  to  B0 
is  given  by 

Bk  =  Bo  +  Yk(Y? Sk)-'Y?  -  BaSk(SlB0Sk)~l SlB0  (7.2) 

or  using  a  representation  analogous  to  (2.17), 


Bk  =  Bq  —  BoSk  Yk 


SlB0Sk 

0 

-1 

'  Sf  Bo  ' 

0 

~YkTSk 

y I  \ 

(7.3) 


(assuming  k  <  n).  The  matrix  Bk  given  by  (7.2)  always  obeys  the  k  secant  equations 
BkSk  =  Yk.  Schnabel  (1983)  shows  that,  assuming  B0  is  symmetric  and  positive  defi¬ 
nite,  Bk  is  symmetric  if  and  only  if  Y fcT Sk  is  symmetric,  and  in  addition  Bk  is  positive 
definite  if  and  only  if  Yj[ Sk  is  positive  definite.  These  conditions  are  satisfied  if  f(x)  is  a 
positive  definite  quadratic,  but  not  in  general  otherwise.  Schnabel  (1983)  discusses  ways 
to  perturb  Yk  to  Yk  so  that  Y? Sk  is  symmetric  and  positive  definite,  at  the  cost  of  no 
longer  exactly  satisfying  the  original  secant  equations  other  than  the  most  recent.  These 
perturbations  have  some  relation  to  the  comparisons  of  this  section,  and  we  will  return 
to  them  shortly. 

By  comparing  the  multiple  secant  update  (7.3)  and  the  representation  for  k  consecu¬ 
tive,  standard  BFGS  updates  (2.17),  it  is  clear  that  these  two  formulas  are  very  similar 
algebraically.  It  is  also  immediate  that  if  Y£ Sk  =  Dk,  the  multiple  BFGS  update  to  B0 
is  equivalent  to  performing  k  standard  BFGS  updates.  This  condition,  which  means  that 
sf  yj  =  0  for  all  i  ^  j,  is  satisfied  if  f(x)  is  quadratic  and  the  step  directions  are  mutually 
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conjugate,  but  not  in  general  otherwise.  In  general,  the  two  formulas  (2.17)  and  (7.3) 
result  in  different  matrices  B k. 

Identical  comments  are  true  regarding  the  BFGS  update  to  the  inverse  Hessian.  The 
inverse  form  of  the  multiple  BFGS  update  (7.3)  is 


Hk  =  Ho  +  [  Sk  H0Yk 


wkT  +  w^(YkT  H0Yk)w;T 

-w;T 


■wr1 


1 

to 

_ 1 

.  YkHo  _ 

(7.4) 


where  Wk  =  Yk  Sk.  Again,  assuming  H0  is  positive  definite,  this  matrix  is  symmetric 
and  positive  definite  if  and  only  if  Yk Sk  is  symmetric  and  positive  definite.  Again,  the 
algebraic  forms  for  (7.4)  and  (2.6)  are  very  similar,  and  by  comparing  these  equations 
and  recalling  definitions  (2.3)  and  (2.7),  it  is  immediate  that  the  updates  are  identical  if 
YkT Sk  =  Dk ,  and  in  general  are  different  otherwise. 

From  these  comparisons,  one  can  see  that  in  the  context  of  limited  memory  methods, 
the  multiple  BFGS  updates  (7.3)  or  (7.4)  would  offer  similar  algebraic  efficiencies  to 
the  representations  (2.17)  or  (2.6)  for  a  sequence  of  standard  BFGS  updates,  that  are 
discussed  in  this  paper.  The  multiple  BFGS  updates  have  the  disadvantage,  however, 
that  Bk  or  Hk  is  not  in  general  symmetric  and  positive  definite  even  if  the  condition 
sfyi  >  0,  i  =  0,  1,  that  guarantees  that  the  matrix  produced  by  k  consecutive, 

standard  BFGS  updates’ is  symmetric  and  positive  definite,  is  satisfied.  Instead,  the 
multiple  secant  updates  require  the  much  stronger  condition  that  Y£  Sk  be  symmetric 
and  positive  definite,  and  there  does  not  appear  to  be  a  practical  way  to  enforce  this 
condition  computationally.  Schnabel  (1983)  has  instead  considered  ways  to  perturb  Yk  to 
Yk  so  that  Yk  Sk  is  symmetric  and  positive  definite,  and  the  most  recent  secant  condition 
(i.e.  the  last  column  of  Yk)  is  unchanged.  In  addition,  if  the  columns  of  Sk  are  not 
strongly  linear  independent,  the  updates  (7.3)  or  (7.4)  may  be  numerical  unstable  so 
some  secant  pairs  must  be  dropped  from  Sk  and  Yk.  Due  to  the  additional  computations 
required  by  these  perturbations,  and  the  lack  of  symmetry  and  positive  definiteness  in 
the  unperturbed  multiple  secant  BFGS  update,  it  does  not  seem  advantageous  to  use  the 
multiple  secant  BFGS  update  rather  than  k  consecutive,  standard  BFGS  updates  in  the 
context  of  limited  memory  methods.  An  interesting  related  question  is  whether  there 
is  a  natural  perturbation  of  Yk  that  causes  the  multiple  secant  update  to  be  equivalent 
to  (2.17);  this  does  not  seem  to  be  the  case,  but  as  mentioned  below  the  situation  is 
different  for  the  SRI  update. 

Now  we  turn  to  the  SRI  update.  The  multiple  secant  SRI  update,  which  to  our 
knowledge  was  first  discussed  in  Schnabel  (1983),  if  applied  to  B0  is  given  by 


Bk  =  B0  +  (Yk  -  B0Sk)((Yk  -  BoSkfSky^Y,  -  B0Sk)T.  (7.5) 

Bk  given  by  (7.5)  always  obeys  the  k  secant  equations  BkSk  —  Yk.  Assuming  Bo  is 
symmetric,  Bk  is  symmetric  if  and  only  if  Y^Sk  is  symmetric,  which  is  true  if  f(x)  is 
quadratic  but  not  necessarily  otherwise.  Like  the  standard  SRI  update,  Bk  given  by 


26 


(7.5)  is  not  necessarily  positive  definite  even  if  the  necessary  conditions  for  the  standard 
BFGS  or  multiple  BFGS  update  to  be  positive  definite  are  met. 

Comparing  the  multiple  SRI  update  (7.5)  to  the  formula  (5.2)  for  k  consecutive, 
standard  SRI  updates,  it  is  clear  that  the  only  difference  between  these  two  formulae 
is  that  (7.5)  contains  the  term  Yk  Sk  as  part  of  the  middle,  inverse  expression,  instead 
of  the  symmetric  term  Dk  +  Lk  +  LTk  in  (5.2).  Recalling  that  Yjf  Sk  =  +  Dk  +  Zjf, 

it  is  immediate  that  (7.5)  and  (5.2)  are  identical  if  Rk  =  Lk,  i.e.  if  sj yj  =  sj y{  for  all 
0  <  i,j  <  k  -  1.  This  condition  is  true  for  f(x)  quadratic,  and  in  this  case  the  multiple 
SRI  update  is  the  same  as  k  consecutive,  standard  SRI  updates.  This  should  come  as 
no  surprise,  because  the  quadratic  termination  result  for  the  standard  SRI  update  also 
implies  that  the  update  preserves  all  past  secant  equations,  as  does  the  multiple  secant 
form  of  the  SRI.  Note  that  the  condition  for  the  equivalence  of  the  multiple  SRI  to 
k  consecutive,  standard  SRI  updates  is  far  milder  condition  than  the  condition  for  the 
equivalence  of  k  standard  BFGS  updates  to  the  multiple  BFGS. 

For  non-quadratic  /(z),  however,  the  standard  and  multiple  SRI  updates  will  gener¬ 
ally  be  different.  Again,  the  algebraic  costs  associated  with  using  the  updates  are  very 
similar,  while  the  multiple  SRI  has  the  disadvantage  that  it  does  not,  in  general,  preserve 
symmetry,  while  a  sequence  of  standard  SRI  updates  does.  Also,  it  is  easier  to  monitor 
stability  of  the  standard  SRI,  since  this  only  involves  considering  each  individual  term 
( Vj  ~  Bjsj)  sj  rather  than  the  matrix  (Yk  —  BoSk)TSk.  For  this  reason,  a  sequence  of 
standard  SRI  updates  would  seem  preferable  to  the  multiple  SRI  update  in  the  context 
of  limited  memory  methods.  It  is  interesting  to  note  that  if  Yk  is  perturbed  to  the  Yk 
that  one  obtains  by  multiplying  Bk  given  in  (5.2)  by  Sk,  then  the  multiple  secant  update 
becomes  identical  to  (5.2).  The  same  relationship  is  not  true  for  the  multiple  BFGS 
update. 

Finally  we  consider  the  Broyden  update  for  nonlinear  equations.  A  multiple  secant 
version  of  Broyden’s  update  has  been  considered  by  several  authors  including  Barnes 
(1965),  Gay  and  Schnabel  (1978),  and  Schnabel  (1983).  In  a  limited  context  using  the 
notation  of  Section  6,  it  is  given  by 

Ak  =  A0  +  (Yk  -  A0Sk)(Sk  Sk^1  Sk  (7.6) 

This  update  is  well  defined  as  long  as  Sk  has  full  column  rank,  and  obeys  the  k  secant 
equations  AkSk  =  Yk. 

Comparing  (7.6)  to  the  formula  (6.4)  for  k  consecutive,  standard  Broyden  updates, 
one  sees  that  the  only  difference  is  in  the  matrix  in  the  middle  of  the  formula  that  is 
inverted.  In  the  multiple  secant  update  it  is  Sj Sk,  while  in  (6.4)  it  is  the  upper  triangular 
portion  of  this  matrix,  including  the  main  diagonal.  Therefore,  the  two  updates  are  the 
same  if  the  directions  in  Sk  are  orthogonal.  The  preference  between  these  two  formulas 
does  not  appear  to  be  clear  cut.  The  formula  (6.4)  has  the  advantage  that  it  is  well  defined 
for  any  Sk,  while  (7.6)  is  only  well  defined  numerically  if  the  k  step  directions  that  make 
up  Sk  are  sufficiently  linearly  independent.  (If  they  are  not,  only  some  subset  of  them  can 
be  utilized  in  a  numerical  implementation  of  the  multiple  Broyden  method;  this  is  the 
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approach  that  has  been  taken  in  implementations  of  this  update.)  On  the  other  hand, 
(7.6)  always  enforces  the  k  prior  secant  equations  while  (6.4)  generally  only  enforces 
the  most  recent  equation.  Thus  it  would  probably  be  worthwhile  considering  either 
method  (or  their  inverse  formulations)  in  a  limited  memory  method  for  solving  nonlinear 
equations.  Note  that  the  key  difference  between  this  comparison  and  the  preceding 
comparisons  of  the  BFGS  and  SRI  based  formulae  is  that  symmetry,  which  in  general 
is  inconsistent  with  satisfying  multiple  secant  equations,  is  not  a  factor  in  the  nonlinear 
equations  case  but  is  a  factor  for  updates  for  optimization  problems. 
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